Goto

Collaborating Authors

 bad sample


MaskCD: Mitigating LVLM Hallucinations by Image Head Masked Contrastive Decoding

Deng, Jingyuan, Yang, Yujiu

arXiv.org Artificial Intelligence

Large vision-language models (LVLMs) have shown remarkable performance in visual-language understanding for downstream multimodal tasks. While their capabilities are improving, problems emerge simultaneously. Among those problems, the hallucinations have attracted much attention, which stands for the phenomenon where LVLMs generate contradictory content to their input visual and text contents. Many approaches have been proposed to deal with this issue, such as contrastive decoding and attention manipulation. However, contrastive decoding methods struggle in constructing appropriate contrastive samples, and attention manipulation methods are highly sensitive, lacking stability. In this work, we propose image head Masked Contrastive Decoding (MaskCD). Our approach utilizes the "image heads" in LVLMs, masking them to construct contrastive samples for contrastive decoding. We evaluated MaskCD on LLaVA-1.5-7b and Qwen-VL-7b, using various benchmarks such as CHAIR, POPE, AMBER and MME. The results demonstrate that MaskCD effectively alleviates the phenomenon of hallucinations and retains the general capabilities of LVLMs. Corresponding resources could be found at: https://github.com/Deng-Jingyuan/MaskCD .


Enhanced Generative Model Evaluation with Clipped Density and Coverage

Salvy, Nicolas, Talbot, Hugues, Thirion, Bertrand

arXiv.org Machine Learning

Although generative models have made remarkable progress in recent years, their use in critical applications has been hindered by their incapacity to reliably evaluate sample quality. Quality refers to at least two complementary concepts: fidelity and coverage. Current quality metrics often lack reliable, interpretable values due to an absence of calibration or insufficient robustness to outliers. To address these shortcomings, we introduce two novel metrics, Clipped Density and Clipped Coverage. By clipping individual sample contributions and, for fidelity, the radii of nearest neighbor balls, our metrics prevent out-of-distribution samples from biasing the aggregated values. Through analytical and empirical calibration, these metrics exhibit linear score degradation as the proportion of poor samples increases. Thus, they can be straightforwardly interpreted as equivalent proportions of good samples. Extensive experiments on synthetic and real-world datasets demonstrate that Clipped Density and Clipped Coverage outperform existing methods in terms of robustness, sensitivity, and interpretability for evaluating generative models.


Top-k Training of GANs: Improving GAN Performance by Throwing Away Bad Samples

Neural Information Processing Systems

We introduce a simple (one line of code) modification to the Generative Adversarial Network (GAN) training algorithm that materially improves results with no increase in computational cost. Through experiments on many different GAN variants, we show that thistop-k update' procedure is a generally applicable improvement. In order to understand the nature of the improvement, we conduct extensive analysis on a simple mixture-of-Gaussians dataset and discover several interesting phenomena. Among these is that, when gradient updates are computed using the worst-scoring batch elements, samples can actually be pushed further away from the their nearest mode. We also apply our method to state-of-the-art GAN models including BigGAN and improve state-of-the-art FID for conditional generation on CIFAR-10 from 9.21 to 8.57.


Review for NeurIPS paper: Top-k Training of GANs: Improving GAN Performance by Throwing Away Bad Samples

Neural Information Processing Systems

Additional Feedback: Post rebuttal I thank the authors for the response. I agree with all reviewers that the method is simple, effective, and generalized to improve GAN methods. However, I think the rebuttal does not address well all my concerns, some of them remain: 1. Regarding the novelty of the paper, using the D scores as the feedback to improve the generator quality is not new. What is new in the paper is the simple way how to use the discriminator scores. However, quite missing substantial discussion and comparison to related works to understand the advantages of the proposed method over the existing works, e.g., stronger improvements or better training time, etc? [*] Metropolis-Hastings Generative Adversarial Networks [**] Your GAN is Secretly an Energy-based Model and You Should use Discriminator Driven Latent Sampling 2. The inconsistency of u value in the experiment is not addressed in the rebuttal.


Top-k Training of GANs: Improving GAN Performance by Throwing Away Bad Samples

Neural Information Processing Systems

We introduce a simple (one line of code) modification to the Generative Adversarial Network (GAN) training algorithm that materially improves results with no increase in computational cost. Through experiments on many different GAN variants, we show that thistop-k update' procedure is a generally applicable improvement. In order to understand the nature of the improvement, we conduct extensive analysis on a simple mixture-of-Gaussians dataset and discover several interesting phenomena. Among these is that, when gradient updates are computed using the worst-scoring batch elements, samples can actually be pushed further away from the their nearest mode. We also apply our method to state-of-the-art GAN models including BigGAN and improve state-of-the-art FID for conditional generation on CIFAR-10 from 9.21 to 8.57.


Understanding and Improving Virtual Adversarial Training

Kim, Dongha, Choi, Yongchan, Kim, Yongdai

arXiv.org Machine Learning

In semi-supervised learning, virtual adversarial training (VAT) approach is one of the most attractive method due to its intuitional simplicity and powerful performances. VAT finds a classifier which is robust to data perturbation toward the adversarial direction. In this study, we provide a fundamental explanation why VAT works well in semi-supervised learning case and propose new techniques which are simple but powerful to improve the VAT method. Especially we employ the idea of Bad GAN approach, which utilizes bad samples distributed on complement of the support of the input data, without any additional deep generative architectures. We generate bad samples of high-quality by use of the adversarial training used in VAT and also give theoretical explanations why the adversarial training is good at both generating bad samples. An advantage of our proposed method is to achieve the competitive performances compared with other recent studies with much fewer computations. We demonstrate advantages our method by various experiments with well known benchmark image datasets.


Learning with Bad Training Data via Iterative Trimmed Loss Minimization

Shen, Yanyao, Sanghavi, Sujay

arXiv.org Machine Learning

In this paper, we study a simple and generic framework to tackle the problem of learning model parameters when a fraction of the training samples are corrupted. We first make a simple observation: in a variety of such settings, the evolution of training accuracy (as a function of training epochs) is different for clean and bad samples. Based on this we propose to iteratively minimize the trimmed loss, by alternating between (a) selecting samples with lowest current loss, and (b) retraining a model on only these samples. We prove that this process recovers the ground truth (with linear convergence rate) in generalized linear models with standard statistical assumptions. Experimentally, we demonstrate its effectiveness in three settings: (a) deep image classifiers with errors only in labels, (b) generative adversarial networks with bad training images, and (c) deep image classifiers with adversarial (image, label) pairs (i.e., backdoor attacks). For the well-studied setting of random label noise, our algorithm achieves state-of-the-art performance without having access to any a-priori guaranteed clean samples.